1. (Previously Presented) A method of automatically creating a dictionary for clustering text
documents comprising: 

inputting a maximum dictionary size; 

determining a frequency of each word in each of said documents;

creating a dictionary of most frequently occurring words in said documents as limited by 
said maximum dictionary size, such that said dictionary contains less than all words in said 
documents; 

after creating said dictionary, determining a frequency of phrases in each of said
documents that contain only words in said dictionary;

adding most frequently occurring phrases to said dictionary; and 

outputting said most frequently occurring words and said most frequently occurring
phrases as said dictionary, wherein said dictionary size limits the number of words and phrases
maintained in said dictionary.
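
By way of illustration only, the steps recited in claim 1 might be sketched as follows. The sketch rests on assumptions that the claim does not state: a "phrase" is taken to be a pair of adjacent words, frequency is taken to be the total count across all documents, and the hypothetical parameter `word_share` decides how much of the maximum dictionary size is given to single words. The function name is likewise hypothetical.

```python
from collections import Counter


def build_dictionary(docs, max_size, word_share=0.5):
    """Illustrative sketch only; word_share and the bigram notion of a phrase are assumptions."""
    tokenized = [doc.lower().split() for doc in docs]

    # Determine the frequency of each word across the documents.
    word_counts = Counter(w for tokens in tokenized for w in tokens)

    # Keep the most frequent words, limited by the maximum size and
    # always fewer than all distinct words in the documents.
    n_words = min(int(max_size * word_share), len(word_counts) - 1)
    words = [w for w, _ in word_counts.most_common(max(n_words, 0))]
    vocab = set(words)

    # Only after the word dictionary exists, count phrases made up
    # solely of dictionary words.
    phrase_counts = Counter(
        f"{a} {b}"
        for tokens in tokenized
        for a, b in zip(tokens, tokens[1:])
        if a in vocab and b in vocab
    )

    # Add the most frequent phrases without exceeding the maximum size.
    phrases = [p for p, _ in phrase_counts.most_common(max_size - len(words))]
    return words + phrases
```

Calling `build_dictionary(docs, max_size=4)` on a small collection returns at most four entries, single words first and phrases after them.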

2. (Previously Presented) The method in claim 1, wherein said determining a frequency of
each word comprises: 

removing punctuation and case from said documents; 
removing stop words from said document; 
replacing words in said documents with synonyms;
removing duplicate words from said documents;

adding remaining words to said dictionary as limited by said maximum dictionary size; 
determining said frequency of each word remaining in said dictionary; and
removing words below a frequency level from said dictionary.
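
By way of illustration only, the word-level steps of claim 2 might look like the sketch below. Several points are assumptions rather than claim language: the function and parameter names are hypothetical, synonyms are supplied as a mapping from a word to its replacement, and removing duplicate words is read as counting each word at most once per document. For brevity the sketch counts first and then truncates to the maximum size, which simplifies the order recited above.

```python
import string
from collections import Counter


def word_frequencies(docs, max_size, stop_words=(), synonyms=None, min_freq=1):
    """Illustrative sketch only; names and the synonym mapping format are assumptions."""
    synonyms = synonyms or {}
    stop = set(stop_words)
    strip_punct = str.maketrans("", "", string.punctuation)

    counts = Counter()
    for doc in docs:
        # Remove punctuation and case.
        tokens = doc.translate(strip_punct).lower().split()
        # Remove stop words, then replace the remaining words with synonyms.
        tokens = [synonyms.get(t, t) for t in tokens if t not in stop]
        # Remove duplicate words so each document counts a word once.
        counts.update(set(tokens))

    # Limit by the maximum dictionary size, then drop words below the frequency level.
    kept = counts.most_common(max_size)
    return {word: n for word, n in kept if n >= min_freq}
```

With `synonyms={"automobile": "car"}`, documents mentioning either word contribute to the single entry `car`.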

3. (Original) The method in claim 2, further comprising inputting one or more of said stop
words, said synonyms, and said frequency level.

4. (Previously Presented) The method in claim 1, wherein said determining a frequency of
phrases comprises:

removing punctuation and case from said documents;

removing stop words from said document; 

replacing words in said documents with synonyms; 

adding said phrases in each of said documents that contain only words in said dictionary
to said dictionary; 

determining said frequency of said phrases remaining in said dictionary; and
removing phrases below a frequency level from said dictionary. 
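
By way of illustration only, the phrase-level steps of claim 4 might be sketched as below. The claim does not define a phrase, so the sketch assumes a run of `n` adjacent words (two by default) after stop-word removal. The function name, the parameter names, and the synonym mapping format are all hypothetical.

```python
import string
from collections import Counter


def phrase_frequencies(docs, dictionary, stop_words=(), synonyms=None, min_freq=2, n=2):
    """Illustrative sketch only; a phrase is assumed to be n adjacent words."""
    synonyms = synonyms or {}
    stop = set(stop_words)
    vocab = set(dictionary)
    strip_punct = str.maketrans("", "", string.punctuation)

    counts = Counter()
    for doc in docs:
        # Remove punctuation and case, remove stop words, replace synonyms.
        tokens = doc.translate(strip_punct).lower().split()
        tokens = [synonyms.get(t, t) for t in tokens if t not in stop]
        # Add only phrases whose words are all already in the dictionary.
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            if all(word in vocab for word in window):
                counts[" ".join(window)] += 1

    # Remove phrases below the frequency level.
    return {phrase: c for phrase, c in counts.items() if c >= min_freq}
```

Because stop words are removed before phrases are formed, words separated only by a stop word become adjacent in this sketch; whether that is desirable depends on the application.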

5. (Original) The method in claim 4, further comprising inputting one or more of said stop
words, said synonyms, and said frequency level. 

6. (Currently Amended) A method of automatically creating a dictionary for clustering text 
documents comprising: 

inputting a maximum dictionary size; 

performing a first pass for each of said documents comprising: 

determining a frequency of each word in each of said documents; and 
creating a dictionary of most frequently occurring words in said documents as 
limited by said maximum dictionary size, such that said dictionary contains less than all words in
said documents; 

after performing said first pass, performing a second pass for each of said documents
comprising: 

determining a frequency of phrases in each of said documents that contain only
words in said dictionary; and 

adding most frequently occurring phrases to said dictionary; and
outputting said most frequently occurring words and said most frequently occurring
phrases as said dictionary, wherein said dictionary size limits the number of words and phrases 
maintained in said dictionary. 
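
By way of illustration only, the two passes of claim 6 might be organized as in the sketch below, where the documents are read twice and never held in memory together. The callable `read_docs`, the parameter `n_words`, and the treatment of a phrase as two adjacent words are assumptions made for the example and are not recited in the claim.

```python
from collections import Counter


def two_pass_dictionary(read_docs, max_size, n_words):
    """Illustrative sketch only; read_docs is a hypothetical callable
    returning a fresh iterator over the documents on every call."""
    # First pass: word frequencies, then the most frequent words,
    # always fewer than all distinct words.
    word_counts = Counter()
    for doc in read_docs():
        word_counts.update(doc.lower().split())
    n_words = min(n_words, max_size, max(len(word_counts) - 1, 0))
    entries = [w for w, _ in word_counts.most_common(n_words)]
    vocab = set(entries)

    # Second pass: begins only after the word dictionary is complete.
    phrase_counts = Counter()
    for doc in read_docs():
        tokens = doc.lower().split()
        for a, b in zip(tokens, tokens[1:]):
            if a in vocab and b in vocab:
                phrase_counts[f"{a} {b}"] += 1

    # The maximum size bounds words and phrases together.
    entries += [p for p, _ in phrase_counts.most_common(max_size - len(entries))]
    return entries
```

The second pass cannot be merged into the first in this arrangement, because the test for whether a phrase contains only dictionary words needs the finished word list.
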
7. (Previously Presented) The method in claim 6, wherein said determining a frequency of
each word comprises: 

removing punctuation and case from said documents;
removing stop words from said document; 
replacing words in said documents with synonyms;
removing duplicate words from said documents; 

adding remaining words to said dictionary as limited by said maximum dictionary size; 
determining said frequency of each word remaining in said dictionary; and
removing words below a frequency level from said dictionary. 

8. (Original) The method in claim 7, further comprising inputting one or more of said stop
words, said synonyms, and said frequency level.

9. (Previously Presented) The method in claim 6, wherein said determining a frequency of
phrases comprises: 

removing punctuation and case from said documents;
removing stop words from said document; 
replacing words in said documents with synonyms; 

adding said phrases in each of said documents that contain only words in said dictionary 
to said dictionary; 

determining said frequency of said phrases remaining in said dictionary; and 
removing phrases below a frequency level from said dictionary.

10. (Original) The method in claim 9, further comprising inputting one or more of said stop
words, said synonyms, and said frequency level.

11. (Currently Amended) A program storage device readable by machine, tangibly
embodying a program of instructions executable by the machine to perform a method of
automatically creating a dictionary for clustering text documents, said method comprising: 
inputting a maximum dictionary size;

determining a frequency of each word in each of said documents;

creating a dictionary of most frequently occurring words in said documents as limited by 
said maximum dictionary size, such that said dictionary contains less than all words in said 
documents; 

after creating said dictionary, determining a frequency of phrases in each of said
documents that contain only words in said dictionary;

adding most frequently occurring phrases to said dictionary; and 

outputting said most frequently occurring words and said most frequently occurring
phrases as said dictionary, wherein said dictionary size limits the number of words and phrases
maintained in said dictionary.

12. (Previously Presented) A program storage device as in claim 11, wherein said
determining a frequency of each word comprises: 

removing punctuation and case from said documents; 

removing stop words from said document; 

replacing words in said documents with synonyms; 

removing duplicate words from said documents; 

adding remaining words to said dictionary; 

determining said frequency of each word remaining in said dictionary; and 
removing words below a frequency level from said dictionary. 

13. (Original) A program storage device as in claim 12, further comprising inputting one or
more of said stop words, said synonyms, and said frequency level. 

14. (Previously Presented) A program storage device as in claim wherein said 
determining a frequency of phrases comprises:

removing punctuation and case from said documents; 
removing stop words from said document; 
replacing words in said documents with synonyms; 

adding said phrases in each of said documents that contain only words in said dictionary 
to said dictionary; 

determining said frequency of said phrases remaining in said dictionary; and
removing phrases below a frequency level from said dictionary. 

15. (Original) A program storage device as in claim 14, further comprising inputting said stop
words. 

16. (Original) A program storage device as in claim 14, further comprising inputting said 
synonyms. 

17. (Original) A program storage device as in claim 14, further comprising inputting said
frequency level.